Action gap
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
Harley Wiltzer, Marc G. Bellemare, David Meger, Patrick Shafto, Yash Jhaveri
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases. We quantify the rate of collapse of these return distributions and exhibit that their statistics collapse at different rates. Moreover, we define distributional perspectives on action gaps and advantages. In particular, we introduce the superiority as a probabilistic generalization of the advantage -- the core object of approaches to mitigating performance issues in high-frequency value-based RL. In addition, we build a superiority-based DRL algorithm. Through simulations in an option-trading domain, we validate that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.
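The central claim above, that action-conditioned return distributions collapse as the decision period shrinks, can be illustrated with a toy Monte Carlo simulation. The sketch below is not the paper's algorithm; the dynamics, reward rates, and uniform policy are all illustrative assumptions.

```python
# Toy Monte Carlo sketch (illustrative assumptions throughout, not the
# paper's method): estimate the return distribution when a fixed first
# action is held for one step of length h, after which a uniform policy
# takes over. As h -> 0, the two action-conditioned return distributions
# collapse onto the policy's return distribution and the mean gap vanishes.
import numpy as np

rng = np.random.default_rng(0)

def returns(first_action, h, n=100_000, gamma=0.9, horizon=5.0):
    steps = int(horizon / h)
    rates = np.array([1.0, 0.0])          # assumed per-action reward rates
    a = np.full(n, first_action)
    g = np.zeros(n)
    for k in range(steps):
        r = h * (rates[a] + rng.standard_normal(n))   # reward scales with h
        g += (gamma ** (k * h)) * r
        a = rng.integers(0, 2, size=n)    # the underlying policy takes over
    return g

for h in (1.0, 0.1, 0.01):
    gap = returns(0, h).mean() - returns(1, h).mean()
    print(f"h={h:5.2f}  mean action gap ~ {gap:.4f}")  # shrinks roughly like h
```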
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
Harm van Seijen, Mehdi Fatemi, Arash Tavakoli
In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action gaps requires revision. We propose an alternative hypothesis, which identifies the size difference of the action gap across the state space as the primary cause. We then introduce a new method that enables more homogeneous action gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
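As a rough illustration of the core idea, the sketch below performs tabular Q-learning with values stored in logarithmic space: bootstrapping happens in regular space, and the target is mapped back through the logarithm. The mapping constants, this exact update, and the restriction to nonnegative rewards are simplifying assumptions, not the paper's full method, which also handles negative value components.

```python
# Simplified tabular sketch of Q-learning in logarithmic space. The mapping
# parameters (c, eps), nonnegative rewards, and this exact update rule are
# assumptions for illustration; the method in the paper is richer.
import numpy as np

n_states, n_actions = 10, 2
c, eps = 1.0, 1e-6
alpha, gamma = 0.1, 0.96

def f(q):          # regular space -> log space
    return c * np.log(q + eps)

def f_inv(q_log):  # log space -> regular space
    return np.exp(q_log / c) - eps

Q_log = f(np.zeros((n_states, n_actions)))  # initialize at f(0)

def update(s, a, r, s_next):
    # Bootstrap in regular space, then map the target back to log space,
    # where multiplicative differences between values become additive,
    # making action gaps more homogeneous across the state space.
    target = r + gamma * f_inv(Q_log[s_next].max())
    Q_log[s, a] += alpha * (f(target) - Q_log[s, a])
```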
A General Family of Robust Stochastic Operators for Reinforcement Learning
Yingdong Lu, Mark S. Squillante, Chai Wah Wu
We consider a new family of operators for reinforcement learning, designed to alleviate the negative effects of, and be more robust to, approximation and estimation errors. We establish various theoretical results, including showing, on a sample-path basis, that our family of operators preserves optimality and increases the action gap. Our empirical results illustrate the strong benefits of our family of operators, which significantly outperforms the classical Bellman operator and recently proposed operators.
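A minimal sketch of the flavor of operator described above: each update subtracts a random fraction of the current action gap V(s) - Q(s, a) from the Bellman target, which preserves the maximizing action while widening the gap. The i.i.d. uniform draw of the coefficient and all hyperparameters are illustrative assumptions.

```python
# Sketch of a stochastic gap-increasing update in the spirit of the paper's
# operator family: subtract a random fraction beta of the action gap from
# the Bellman target. The uniform draw of beta is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_actions))

def update(s, a, r, s_next):
    beta = rng.uniform(0.0, 1.0)             # fresh random coefficient each update
    target = r + gamma * Q[s_next].max()     # standard Bellman target
    target -= beta * (Q[s].max() - Q[s, a])  # widen the action gap
    Q[s, a] += alpha * (target - Q[s, a])
```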
Increasing the Action Gap: New Operators for Reinforcement Learning
Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Remi Munos (all Google DeepMind)
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
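For concreteness, here is a minimal tabular sketch of a gap-increasing update in the style of the advantage learning algorithm the abstract names: the standard Bellman target is penalized by a fraction of the gap V(s) - Q(s, a), pushing suboptimal action values further down while leaving the greedy policy at the optimum unchanged. Hyperparameter values and the toy setup are illustrative assumptions.

```python
# Minimal tabular sketch of an advantage-learning-style, gap-increasing
# update. Hyperparameter values and the setup are illustrative assumptions.
import numpy as np

n_states, n_actions = 10, 2
lr, gamma, alpha_gap = 0.1, 0.99, 0.9
Q = np.zeros((n_states, n_actions))

def update(s, a, r, s_next):
    bellman = r + gamma * Q[s_next].max()                   # standard target
    target = bellman - alpha_gap * (Q[s].max() - Q[s, a])   # subtract a fraction of the gap
    Q[s, a] += lr * (target - Q[s, a])
```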